ON THE SPECTRAL NORM OF
GAUSSIAN RANDOM MATRICES 


RAMON VAN HANDEL 
In memory of Evarist Giné


Abstract. Let $X$ be a $d\times d$ symmetric random matrix with independent but
non-identically distributed Gaussian entries. It has been conjectured by Latała
that the spectral norm of $X$ is always of the same order as the largest Euclidean
norm of its rows. A positive resolution of this conjecture would provide a
sharp understanding of the probabilistic mechanisms that control the spectral
norm of inhomogeneous Gaussian random matrices. This paper establishes the
conjecture up to a dimensional factor of order $\sqrt{\log\log d}$. Moreover,
dimension-free bounds are developed that are optimal to leading order and that establish
the conjecture in special cases. The proofs of these results shed significant 
light on the geometry of the underlying Gaussian processes. 


1. Introduction 

Let $X$ be a symmetric random matrix with independent mean zero entries. If
the variances of the entries are all of the same order, this model is known as a
Wigner matrix and has been widely studied in the literature (e.g., [1]). Due to the
large amount of symmetry of such models, extremely precise analytic results are
available on the limiting behavior of fine-scale spectral properties of the matrix.
Our interest, however, goes in an orthogonal direction. We consider the case where
the variances of the entries are given but arbitrary: that is, we consider structured
random matrices where the structure is given by the variance pattern of the entries.
The challenge in investigating such matrices is to understand how the given
structure of the matrix is reflected in its spectral properties.

In particular, we are interested in the location of the edge of the spectrum, that
is, in the expected spectral norm $\mathbf{E}\|X\|$ of the matrix. When the entries of the
matrix are i.i.d., a complete understanding up to universal constants is provided
by a remarkable result of Seginer [6] which states that the expected spectral norm
of the matrix is of the same order as the largest Euclidean norm of its rows and
columns. Unfortunately, this result hinges crucially on the invariance of the distribution
of the matrix under permutations of the entries, and is therefore useless
in the presence of nontrivial structure. It is noted in [6] that the conclusion fails
already in the simplest examples of structured random matrices with bounded entries.
Surprisingly, however, no counterexamples to this statement are known for
structured random matrices with independent Gaussian entries. This observation
has led to the following conjecture proposed by R. Latała (see also [4, 5]).

Throughout the remainder of this paper, $X$ will denote the $d\times d$ symmetric
random matrix with entries $X_{ij}=b_{ij}g_{ij}$, where $\{g_{ij}: i\ge j\}$ are independent
standard Gaussian random variables and $\{b_{ij}: i\ge j\}$ are given nonnegative scalars.
We write $a\lesssim b$ if $a\le Cb$ for a universal constant $C$, and $a\asymp b$ if $a\lesssim b$ and $b\lesssim a$.


Conjecture 1. The expected spectral norm satisfies 


$$
\mathbf{E}\|X\| \asymp \mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big].
$$



The lower bound in Conjecture 1 holds trivially for any deterministic matrix: 
if a matrix has a row with large Euclidean norm, then its spectral norm must be 
large. Conjecture 1 suggests that for Gaussian random matrices, this is the only 
reason why the spectral norm can be large. It is not at all clear, however, what 
mechanism might give rise to this phenomenon, particularly as the Gaussian nature 
of the entries must play a crucial role for the conjecture to hold. 
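
The deterministic lower bound can be observed numerically. The following is a minimal sketch, not part of the original argument: the function names are illustrative, and the spectral norm is only approximated from below by power iteration started at the coordinate vector of the heaviest row. For a symmetric matrix the quantities $\|Xv_k\|$ along this iteration are nondecreasing in exact arithmetic and never exceed $\|X\|$, so the estimate sits between the largest row norm and the spectral norm.

```python
import math
import random


def sample_symmetric_gaussian(b, rng):
    """Draw X with X_ij = X_ji = b_ij * g_ij, g_ij i.i.d. standard Gaussian for i >= j."""
    d = len(b)
    x = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            x[i][j] = x[j][i] = b[i][j] * rng.gauss(0.0, 1.0)
    return x


def max_row_norm(x):
    """Largest Euclidean norm of a row of x."""
    return max(math.sqrt(sum(t * t for t in row)) for row in x)


def spectral_norm_lower_estimate(x, iters=200):
    """Power iteration on a symmetric x, started at the heaviest row's basis vector.

    Returns a lower estimate of the spectral norm; the first iterate equals
    the largest row norm, and later iterates do not decrease (up to rounding).
    """
    d = len(x)
    start = max(range(d), key=lambda i: sum(t * t for t in x[i]))
    v = [1.0 if i == start else 0.0 for i in range(d)]
    norm = 0.0
    for _ in range(iters):
        w = [sum(x[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0
        v = [t / norm for t in w]
    return norm


if __name__ == "__main__":
    rng = random.Random(0)
    b = [[1.0, 0.5, 0.0], [0.5, 2.0, 0.1], [0.0, 0.1, 0.3]]
    x = sample_symmetric_gaussian(b, rng)
    print(max_row_norm(x), spectral_norm_lower_estimate(x))
```

The variance pattern `b` above is an arbitrary choice made for illustration only.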

Recently, Bandeira and the author [2] proved a sharp dimension-dependent upper 
bound on ||X|| (we refer to [2] for a discussion of earlier work on this topic): 


Theorem 1.1 ([2]). The expected spectral norm satisfies 


$$
\mathbf{E}\|X\| \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b_{ij}\sqrt{\log d}.
$$


The combinatorial proof of this result sheds little light on the phenomenon described
by Conjecture 1. Nonetheless, the right-hand side of this expression is a
natural upper bound on the right-hand side of Conjecture 1 [2, Remark 3.16]. On
the other hand, the terms in this bound admit another natural interpretation. A
simple computation shows that the first term in this bound is precisely $\|\mathbf{E}X^2\|^{1/2}$,
while the second term is an upper bound on $\mathbf{E}\max_{ij}|X_{ij}|$. This suggests the following
alternative to Conjecture 1 that is also consistent with Theorem 1.1.


Conjecture 2. The expected spectral norm satisfies 

$$
\mathbf{E}\|X\| \asymp \|\mathbf{E}X^2\|^{1/2} + \mathbf{E}\max_{ij}|X_{ij}|.
$$

Once again, the lower bound in Conjecture 2 holds trivially (cf. [2, section 3.5]): 
the first term follows readily from Jensen’s inequality, while the second term follows 
as the spectral norm of any matrix is bounded below by the magnitude of its largest 
entry. Thus the two terms in the lower bound reflect two distinct mechanisms 
that control the spectral norm of any random matrix: a random matrix has large 
spectral norm if it is large on average (as is quantified by $\|\mathbf{E}X^2\|^{1/2}$; note that the
expectation here is inside the norm!), or if one of its entries is large (as is quantified
by $\mathbf{E}\max_{ij}|X_{ij}|$). Conjecture 2 suggests that for Gaussian random matrices, these
are the only reasons why the spectral norm can be large. 

In many cases the improvement of Conjectures 1 and 2 over Theorem 1.1 is 
modest, as the latter bound is already tight under mild assumptions. On the one 
hand, if $\max_i\sum_j b_{ij}^2 \gtrsim \max_{ij} b_{ij}^2\log d$, then the first term in Theorem 1.1 dominates
and therefore $\mathbf{E}\|X\| \asymp \|\mathbf{E}X^2\|^{1/2}$ as predicted by Conjecture 2. On the other hand,
if a polynomial number $\gtrsim d^c$ of entries $X_{kl}$ of the matrix have variance of the same
order as the largest variance $b_{kl}\gtrsim\max_{ij}b_{ij}$, then $\mathbf{E}\max_{ij}|X_{ij}|\gtrsim\max_{ij}b_{ij}\sqrt{\log d}$
and thus Theorem 1.1 also implies Conjecture 2. These observations indicate that
Theorem 1.1 already implies Conjecture 2 when the matrix is “not too sparse”. 
Nonetheless, the apparent sharpness of Theorem 1.1 belies a fundamental gap in 
our understanding of the probabilistic mechanisms that control the spectral norm 
of Gaussian random matrices: the phenomena predicted by Conjectures 1 and 2 
are inherently dimension-free, while the assumptions under which Theorem 1.1 is 
tight exhibit nontrivial dependence on dimension. The resolution of Conjectures 1 
and 2 would therefore provide a substantially deeper insight into the structure of 
Gaussian random matrices than is obtained from Theorem 1.1. 

The aim of this paper is to develop a number of new techniques and insights that 
contribute to a deeper understanding of Conjectures 1 and 2. While our results fall
short of resolving these conjectures, they provide strong evidence for their validity
and shed significant light on the geometry of the problem.

We begin by observing that Conjectures 1 and 2 are in fact equivalent, which
is not entirely obvious at first sight. In fact, our first result provides an explicit
expression for the right-hand side in Conjectures 1 and 2 in terms of the coefficients
bij. (A much more complicated expression in terms of Musielak-Orlicz norms can 
be found in [5], but is too unwieldy to be of use in the sequel.) 

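Theorem 1.2. Conjectures 1 and 2 are equivalent:

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big]
\asymp \|\mathbf{E}X^2\|^{1/2} + \mathbf{E}\max_{ij}|X_{ij}|
\asymp \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i},
$$

where the matrix $\{b^*_{ij}\}$ is obtained by permuting the rows and columns of the matrix
$\{b_{ij}\}$ such that $\max_j b^*_{1j}\ge\max_j b^*_{2j}\ge\cdots\ge\max_j b^*_{dj}$.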
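
The explicit expression in Theorem 1.2 is straightforward to evaluate. The sketch below is illustrative only (the function name `explicit_bound` is not from the paper). It uses the observation that $\max_{ij}b^*_{ij}\sqrt{\log i}=\max_i(\max_j b^*_{ij})\sqrt{\log i}$, so that it suffices to sort the row maxima of $\{b_{ij}\}$ in decreasing order rather than to permute the matrix itself.

```python
import math


def explicit_bound(b):
    """max_i sqrt(sum_j b_ij^2) + max_ij b*_ij sqrt(log i) for a symmetric nonnegative b."""
    sigma = max(math.sqrt(sum(t * t for t in row)) for row in b)
    # Sorting the row maxima in decreasing order realizes the permutation defining b*.
    row_maxima = sorted((max(row) for row in b), reverse=True)
    gamma = max(m * math.sqrt(math.log(i)) for i, m in enumerate(row_maxima, start=1))
    return sigma + gamma


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(explicit_bound(identity))  # equals 1 + sqrt(log 3)
```

Note that the row indexed $i=1$ contributes nothing to the second term, as $\log 1=0$.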

As the bound of Theorem 1.1 appears to be tantalizingly close to the expression
in Theorem 1.2, one might hope that the latter could be established by a
refinement of the methods that were developed in [2]. The proof of Theorem 1.1
in [2] relies heavily on the moment method, which is widely used in the analysis
of random matrices. This method is based on the elementary observation that
$\|X\|^{2p}\le\mathrm{Tr}[X^{2p}]\le d\,\|X\|^{2p}$ for any $d\times d$ symmetric matrix $X$ and $p\ge 1$, so that

$$
\mathbf{E}[\|X\|^{2p}]^{1/2p} \asymp \mathbf{E}[\mathrm{Tr}[X^{2p}]]^{1/2p} \quad\text{for } p\sim\log d.
$$


The essential feature of the moment method is that the right-hand side of this 
expression is the expectation of a polynomial in the entries of the matrix, which 
admits an explicit expression that is amenable to combinatorial analysis. By its 
very nature, any proof using the moment method cannot directly bound $\mathbf{E}\|X\|$;
instead, this method bounds the larger quantity $\mathbf{E}[\|X\|^{\log d}]^{1/\log d}$, which is what
is actually done in [2]. For the latter quantity, however, it is readily seen that the
result of Theorem 1.1 is already sharp without any additional assumptions:

$$
\mathbf{E}[\|X\|^{\log d}]^{1/\log d} \asymp \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b_{ij}\sqrt{\log d}.
$$

The upper bound is proved in [2], while the lower bound follows along the lines of
Conjecture 2 from the estimate $\mathbf{E}[\|X\|^{\log d}]^{1/\log d} \gtrsim \|\mathbf{E}X^2\|^{1/2} + \max_{ij}\|X_{ij}\|_{\log d}$.

We therefore see that the moment method is exploited optimally in the proof of 
Theorem 1.1, so that the resolution of Conjectures 1 and 2 cannot be addressed by 
the same technique that gave rise to Theorem 1.1. 
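
The sandwich inequality $\|X\|^{2p}\le\mathrm{Tr}[X^{2p}]\le d\,\|X\|^{2p}$ that underlies the moment method can be checked on small examples whose spectrum is known in closed form. The following sketch is only an illustration under that assumption; the helper names `matmul` and `trace_moment` are hypothetical and the matrix powers are computed naively.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)] for i in range(d)]


def trace_moment(x, p):
    """Tr[x^(2p)]^(1/(2p)) for a symmetric matrix x and an integer p >= 1."""
    d = len(x)
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(2 * p):
        power = matmul(power, x)
    return sum(power[i][i] for i in range(d)) ** (1.0 / (2 * p))


if __name__ == "__main__":
    # diag(3, 1) has spectral norm 3 and d = 2.
    x = [[3.0, 0.0], [0.0, 1.0]]
    for p in (1, 2, 4):
        t = trace_moment(x, p)
        print(p, 3.0 <= t <= 2 ** (1.0 / (2 * p)) * 3.0)
```

The factor $d^{1/2p}$ separating the two sides becomes a universal constant once $p\sim\log d$, which is the reason for this choice of $p$.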

Nonetheless, by a slicing procedure that applies Theorem 1.1 separately at different
scales, we can already establish that Conjectures 1 and 2 hold up to a very
mild dimensional factor. This is our second main result. 

Theorem 1.3. The expected spectral norm satisfies

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big] \lesssim \mathbf{E}\|X\| \lesssim \sqrt{\log\log d}\;\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big].
$$

While this result still exhibits an explicit dependence on dimension, the point
of Theorem 1.3 is that the very mild dimensional factor $\sqrt{\log\log d}$ is of much
smaller order than the natural scale $\sim\sqrt{\log d}$ that appears in the sharp
dimension-dependent bound of Theorem 1.1; in this sense, Theorem 1.3 could be viewed as
providing significant evidence for the validity of Conjectures 1 and 2.

In the final part of this paper, we develop an entirely different approach for 
bounding the spectral norm of Gaussian random matrices. Unlike the methods 
developed so far, this approach is genuinely dimension-free and sheds significant 
light on the probabilistic mechanism that lies at the heart of Conjectures 1 and 2. 
The starting point for this approach is the elementary observation that 


$$
\mathbf{E}\|X\| = \mathbf{E}\Big[\sup_{v\in B_2}|\langle v,Xv\rangle|\Big]
$$

is the expected supremum of a Gaussian process indexed by the Euclidean unit
ball $B_2$. It is well known that such quantities are completely characterized, up to
universal constants, by the geometry of the metric space $(B_2,d)$, where

$$
d(v,w)^2 := \mathbf{E}[|\langle v,Xv\rangle-\langle w,Xw\rangle|^2]
$$

is the natural metric associated with the Gaussian process (cf. [7]). Therefore, in
principle, understanding the spectral norm of Gaussian random matrices requires
“only” a sufficiently good understanding of the geometry of the metric space $(B_2,d)$.
To this end, we show that the geometry of $(B_2,d)$ can be related to the Euclidean
geometry of certain nonlinear deformations of the unit ball. The geometric structure
exhibited by this mechanism appears to almost resolve Conjectures 1 and 2, but
we do not know how to optimally exploit this structure. Even a crude application
of this idea, however, suffices to prove a nontrivial dimension-free bound.

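Theorem 1.4. The expected spectral norm satisfies

$$
\mathbf{E}\|X\| \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_i\Big[\sum_j (b^*_{ij})^4\Big]^{1/4}\sqrt{\log i}.
$$

It is instructive to compare this bound with the expression in Theorem 1.2. Using
$2\sqrt{ab}\le a+b$, it is readily seen that Theorem 1.4 implies the bound

$$
\mathbf{E}\|X\| \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_i\Big[\sum_j (b^*_{ij})^4\Big]^{1/4}\sqrt{\log i}
\lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\log i.
$$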

While this estimate falls slightly short of the conjectured optimal bound of Theorem 1.2
(due to the wrong power on the logarithm), it is dimension-free precisely
in the expected manner. Together with the natural geometric structure exhibited
in the proof, this provides further evidence for the validity of Conjectures 1 and 2.
The result of Theorem 1.4 is complementary to Theorem 1.1: while Theorem 1.1
is often sharp, Theorem 1.4 can give a substantial improvement for highly inhomogeneous
matrices. For example, Theorem 1.4 readily implies the dimension-free
bound of Latała [4], which could not be reproduced using Theorem 1.1.

The statement of Theorem 1.4 was chosen for sake of illustration; it is in fact 
a direct consequence of a sharper bound that arises from the proof. This sharper 
bound both improves somewhat on Theorem 1.4 for arbitrary matrices, and is able 
to establish the validity of Conjectures 1 and 2 in certain special cases. For example,
we will establish these conjectures under the assumption that the matrix of
variances $\{b_{ij}^2\}$ is positive definite or has a small number of negative eigenvalues.
While these special cases are restrictive, they emphasize that the underlying geometric
principle is not yet exploited optimally in the proof. The elimination of this
inefficiency provides a promising route to the resolution of Conjectures 1 and 2. 

The ideas described above are developed in detail in the sequel. Our main results, 
Theorems 1.2, 1.3, and 1.4, are proved in sections 2, 3, and 4, respectively. 


2. Gaussian estimates 

The aim of this section is to prove Theorem 1.2. We will, in fact, consider an 
additional quantity beside those that appear in Conjectures 1 and 2. Let $g_1,\dots,g_d$
be independent standard Gaussian variables, and consider the quantity

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2 g_j^2}\Big].
$$

This quantity will appear naturally from the geometry that is to be developed
in section 4 below. The maximum is taken here over random variables with the
same distribution as in Conjecture 1 (note that these quantities differ only in that
$b_{ij}^2g_{ij}^2$ is replaced by $b_{ij}^2g_j^2$); however, in the above quantity these variables are
dependent, while the maximum is taken over independent variables in Conjecture 1.
Nonetheless, these quantities prove to be of the same order. The equivalence of 
the various quantities considered below indicates that the phenomena described 
by Conjectures 1 and 2 can appear in many different guises, providing us with 
substantial freedom in how to approach the proof of these conjectures. 
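Theorem 2.1. The following quantities are of the same order:

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big]
\asymp \mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2 g_j^2}\Big]
\asymp \|\mathbf{E}X^2\|^{1/2} + \mathbf{E}\max_{ij}|X_{ij}|
\asymp \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i},
$$

where we recall that the matrix $\{b^*_{ij}\}$ is obtained by permuting the rows and columns
of the matrix $\{b_{ij}\}$ such that $\max_j b^*_{1j}\ge\max_j b^*_{2j}\ge\cdots\ge\max_j b^*_{dj}$.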

Remark 2.2. The proof of the upper bound in Theorem 2.1 in fact yields

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2 g_j^2}\Big] \le \max_i\sqrt{\sum_j b_{ij}^2} + C\max_{ij} b^*_{ij}\sqrt{\log(i+1)}
$$

for a universal constant $C$ (that is, the constant in front of the leading term is one).
This is used in section 4 to prove Theorem 1.4 with an optimal constant.

The proof of Theorem 2.1 is based on elementary estimates for the maxima of
(sub-)Gaussian random variables with inhomogeneous variances.


2.1. Gaussian maxima. We begin by recalling a standard upper bound on the
maximum of sub-Gaussian random variables, cf. [7, Proposition 2.4.16].

Lemma 2.3. Let $X_1,\dots,X_n$ be not necessarily independent random variables with

$$
\mathbf{P}[X_i > x] \le C e^{-x^2/\sigma_i^2} \quad\text{for all } x\ge 0,\ i,
$$

where $C$ is a universal constant and $\sigma_i\ge 0$ are given. Then

$$
\mathbf{E}\Big[\max_{i\le n} X_i\Big] \lesssim \max_{i\le n}\sigma^*_i\sqrt{\log(i+1)},
$$

where $\sigma^*_1\ge\sigma^*_2\ge\cdots\ge\sigma^*_n$ is the decreasing rearrangement of $\sigma_1,\dots,\sigma_n$.

The essential tool in the proof of Theorem 2.1 is that the result of Lemma 2.3 
can be reversed when the random variables are independent and Gaussian. 

Lemma 2.4. Let $X_1,\dots,X_n$ be independent with $X_i\sim N(0,\sigma_i^2)$. Then

$$
\mathbf{E}\Big[\max_{i\le n}|X_i|\Big] \gtrsim \max_{i\le n}\sigma^*_i\sqrt{\log(i+1)},
$$

where $\sigma^*_1\ge\sigma^*_2\ge\cdots\ge\sigma^*_n$ is the decreasing rearrangement of $\sigma_1,\dots,\sigma_n$.


Proof. By permutation invariance, we can assume that $\sigma_i$ are nonincreasing in $i$
(so that $\sigma_i=\sigma^*_i$). Fix $j\ge 1$ and let $g_i=X_i/\sigma_i$. Then $|X_i|\ge\sigma_j|g_i|$ for all $i\le j$
and $g_1,\dots,g_j$ are i.i.d. standard Gaussian variables. We therefore have

$$
\mathbf{E}\Big[\max_{i\le n}|X_i|\Big] \ge \sigma_j\,\mathbf{E}\Big[\max_{i\le j}|g_i|\Big] \gtrsim \sigma_j\sqrt{\log(j+1)},
$$

where we used that the maximum of $j$ i.i.d. standard Gaussian variables is of order
$\sqrt{\log(j+1)}$ [7, Exercise 2.2.7]. It remains to take the maximum over $j\le n$. □
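
The quantity $\max_{i\le n}\sigma^*_i\sqrt{\log(i+1)}$ that appears in Lemmas 2.3 and 2.4 depends on the variances only through their decreasing rearrangement. A minimal sketch of its computation follows; the function name is illustrative and the indexing $i=1,2,\dots$ follows the statements of the lemmas.

```python
import math


def rearranged_log_max(sigmas):
    """max_i sigma*_i * sqrt(log(i + 1)), with sigma* the decreasing rearrangement."""
    ordered = sorted(sigmas, reverse=True)
    return max(s * math.sqrt(math.log(i + 1)) for i, s in enumerate(ordered, start=1))


if __name__ == "__main__":
    # Sorted: 3, 2, 1; the candidates are 3*sqrt(log 2), 2*sqrt(log 3), sqrt(log 4).
    print(rearranged_log_max([1.0, 3.0, 2.0]))
```

When all $\sigma_i$ are equal to one, the expression reduces to $\sqrt{\log(n+1)}$, the familiar order of the maximum of $n$ i.i.d. standard Gaussian variables.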
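2.2. Proof of Theorem 2.1. Let us begin by writing

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big]
\le \max_i\mathbf{E}\Big[\sqrt{\sum_j X_{ij}^2}\Big]
+ \mathbf{E}\Big[\max_i\Big\{\sqrt{\sum_j X_{ij}^2} - \mathbf{E}\sqrt{\sum_j X_{ij}^2}\Big\}\Big].
$$

By Jensen’s inequality, we have

$$
\max_i\mathbf{E}\Big[\sqrt{\sum_j X_{ij}^2}\Big] \le \max_i\sqrt{\sum_j b_{ij}^2}.
$$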


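On the other hand, by Gaussian concentration [3, Theorem 5.8], we have

$$
\mathbf{P}\Big[\sqrt{\sum_j X_{ij}^2} - \mathbf{E}\sqrt{\sum_j X_{ij}^2} > t\Big] \le e^{-t^2/2\max_j b_{ij}^2}
$$

for every $i\le d$ and $t\ge 0$. We therefore obtain

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big] \le \max_i\sqrt{\sum_j b_{ij}^2} + C\max_{ij} b^*_{ij}\sqrt{\log(i+1)}
$$

by Lemma 2.3, where $C$ is a universal constant. As

$$
\max_{ij} b^*_{ij}\sqrt{\log(i+1)} \lesssim \max_j b^*_{1j} + \max_{ij} b^*_{ij}\sqrt{\log i}
\le \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i},
$$

we have shown

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big] \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i}
$$

(this last step is irrelevant to our results and is included for cosmetic reasons only).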
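Next, we note that

$$
\sum_j b_{ij}^2 = \mathbf{E}\Big[\sqrt{\sum_j X_{ij}^2}\Big]^2 + \mathrm{Var}\Big[\sqrt{\sum_j X_{ij}^2}\Big]
\le \mathbf{E}\Big[\sqrt{\sum_j X_{ij}^2}\Big]^2 + \max_j b_{ij}^2
\lesssim \mathbf{E}\Big[\sqrt{\sum_j X_{ij}^2}\Big]^2,
$$

where we have used the Gaussian Poincaré inequality [3, Theorem 3.20]. Therefore,

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big] \gtrsim \max_i\sqrt{\sum_j b_{ij}^2} = \|\mathbf{E}X^2\|^{1/2}.
$$

On the other hand, we trivially have

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big] \ge \mathbf{E}\max_{ij}|X_{ij}|.
$$

Averaging these bounds gives

$$
\|\mathbf{E}X^2\|^{1/2} + \mathbf{E}\max_{ij}|X_{ij}| \lesssim \mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big].
$$

In the opposite direction, for every $i$ choose $j(i)$ such that $b_{ij(i)}=\max_j b_{ij}$. Then

$$
\mathbf{E}\max_{ij}|X_{ij}| \ge \mathbf{E}\max_i|X_{ij(i)}| \gtrsim \max_{ij} b^*_{ij}\sqrt{\log i}
$$

by Lemma 2.4. Putting together the above bounds, we have shown that

$$
\max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i}
\lesssim \|\mathbf{E}X^2\|^{1/2} + \mathbf{E}\max_{ij}|X_{ij}|
\lesssim \mathbf{E}\Big[\max_i\sqrt{\sum_j X_{ij}^2}\Big]
\lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i}.
$$

This establishes the equivalence between Conjectures 1 and 2.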

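It remains to consider the second quantity in Theorem 2.1. The upper bound

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2 g_j^2}\Big] \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i}
$$

and the lower bound

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2 g_j^2}\Big] \gtrsim \max_i\sqrt{\sum_j b_{ij}^2}
$$

are obtained by repeating verbatim the corresponding arguments for the first quantity
in Theorem 2.1. On the other hand, we can now estimate

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2 g_j^2}\Big] \ge \mathbf{E}\max_{ij} b_{ij}|g_j| \gtrsim \max_{ij} b^*_{ij}\sqrt{\log i}
$$

by Lemma 2.4. Averaging these bounds completes the proof. □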


3. Slicing 


The aim of this short section is to prove Theorem 1.3. The lower bound is trivial, 
and therefore by Theorem 1.2 it remains to prove the following. 


Theorem 3.1. The expected spectral norm satisfies 


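$$
\mathbf{E}\|X\| \lesssim \sqrt{\log\log d}\,\Big\{\max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij} b^*_{ij}\sqrt{\log i}\Big\}.
$$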



This result will be established by slicing the matrix into ~ log log d pieces at 
different scales, each of which is bounded separately using Theorem 1.1. 

It proves to be convenient for the present purposes to work with matrices with
independent entries that are not symmetric (as opposed to symmetric matrices,
for which $X_{ij}=X_{ji}$ are not independent). To this end, let us cite the following
non-symmetric variant of Theorem 1.1, see [2, Theorem 3.1] and its proof.

Theorem 3.2 ([2]). Let $Z$ be the $d_1\times d_2$ matrix whose entries $Z_{ij}\sim N(0,c_{ij}^2)$ are
independent Gaussian variables. Then the expected spectral norm satisfies

$$
\mathbf{E}[\|Z\|^2]^{1/2} \lesssim \max_i\sqrt{\sum_j c_{ij}^2} + \max_j\sqrt{\sum_i c_{ij}^2} + \max_{ij} c_{ij}\sqrt{\log(d_1\wedge d_2)}.
$$

We can now proceed to the proof of Theorem 3.1.

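Proof of Theorem 3.1. By permuting the rows and columns of $X$ if necessary, we
can assume without loss of generality in the sequel that $b_{ij}=b^*_{ij}$.

We begin by decomposing the matrix $X=X^\uparrow+X^\downarrow$ into its parts above and
below the diagonal: that is, $X^\uparrow_{ij}:=X_{ij}1_{i<j}$ and $X^\downarrow_{ij}:=X_{ij}1_{i\ge j}$. As

$$
\mathbf{E}\|X\| \le \mathbf{E}\|X^\uparrow\| + \mathbf{E}\|X^\downarrow\| \le 2\,\mathbf{E}\|X^\downarrow\|
$$

(the second bound follows by Jensen’s inequality), it suffices to bound $\mathbf{E}\|X^\downarrow\|$.

We now decompose $X^\downarrow$ into $N:=\lceil\log_2\log_2 d\rceil$ horizontal slices as follows:

$$
X^\downarrow = \sum_{n=1}^N X^{(n)}
$$

with

$$
X^{(1)}_{ij} := X^\downarrow_{ij}1_{i\le 4}, \qquad
X^{(n)}_{ij} := X^\downarrow_{ij}1_{2^{2^{n-1}}<i\le 2^{2^n}} \quad\text{for } 2\le n\le N.
$$

Each matrix $X^{(n)}$ has independent entries, and the only nonzero entries of this
matrix are contained in its upper $2^{2^n}\times 2^{2^n}$ block. Moreover,

$$
\|X^\downarrow\|^2 = \|X^{\downarrow *}X^\downarrow\| = \Big\|\sum_{n=1}^N X^{(n)*}X^{(n)}\Big\| \le \sum_{n=1}^N\|X^{(n)}\|^2.
$$

We therefore have

$$
\mathbf{E}\|X\| \le 2\sqrt{N}\max_{n\le N}\mathbf{E}[\|X^{(n)}\|^2]^{1/2}.
$$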
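
The bookkeeping of the slices is easily made explicit. The following sketch is illustrative only (the function names are not from the paper): it computes the number of slices $N=\lceil\log_2\log_2 d\rceil$ and the index $n$ of the slice that contains a given row $i\ge 1$, under the convention that the first slice consists of the rows $i\le 4$ and that $d\ge 2$.

```python
import math


def num_slices(d):
    """N = ceil(log2 log2 d), taken to be at least 1; assumes d >= 2."""
    return max(1, math.ceil(math.log2(math.log2(d))))


def slice_index(i):
    """Index n of the slice containing row i >= 1.

    n = 1 if i <= 4, and otherwise n is determined by 2^(2^(n-1)) < i <= 2^(2^n).
    """
    n = 1
    while i > 2 ** (2 ** n):
        n += 1
    return n


if __name__ == "__main__":
    for d in (4, 16, 17, 256, 1000):
        print(d, num_slices(d), slice_index(d))
```

The doubly exponential growth of the slice boundaries is what keeps the number of slices of order $\log\log d$.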

We now apply Theorem 3.2 to estimate each term $\mathbf{E}[\|X^{(n)}\|^2]$. Define the quantities

$$
\sigma := \max_i\sqrt{\sum_j b_{ij}^2}, \qquad \Gamma := \max_{ij} b^*_{ij}\sqrt{\log i}.
$$

As we assumed that $b_{ij}=b^*_{ij}$, it follows immediately that

$$
b_{ij} \le \frac{\Gamma}{\sqrt{\log i}} \quad\text{for all } i,j.
$$

In particular, this implies that for $n\ge 2$

$$
\max_{ij}\mathrm{Var}(X^{(n)}_{ij}) \le \max_{i>2^{2^{n-1}}}\frac{\Gamma^2}{\log i} \lesssim 2^{-n}\Gamma^2.
$$

On the other hand, the sum of the variances of the entries in any row or column
of $X^{(n)}$ is clearly still bounded by $\sigma^2$. Finally, as noted above, $X^{(n)}$ is a
$2^{2^n}\times 2^{2^n}$-dimensional matrix (we can remove all vanishing rows and columns without
decreasing the norm). Applying Theorem 3.2 yields for every $2\le n\le N$

$$
\mathbf{E}[\|X^{(n)}\|^2]^{1/2} \lesssim \sigma + 2^{-n/2}\,\Gamma\sqrt{\log 2^{2^n}} \lesssim \sigma + \Gamma.
$$

On the other hand, applying Theorem 3.2 with $d_1=d_2=4$ immediately yields the
analogous bound for $X^{(1)}$. We therefore finally obtain

$$
\mathbf{E}\|X\| \lesssim \sqrt{N}\,(\sigma+\Gamma),
$$

which completes the proof. □

The proof of Theorem 3.1 does not really contain a new idea: it follows directly 
from the dimension-dependent bound of Theorem 3.2 by applying it in a multiscale 
fashion. The problem with this approach is that while we engineered the slices so 
that Theorem 3.2 is sharp on each slice, substantial loss is incurred in the estimate

$$
\|X^\downarrow\|^2 \le \sum_{n=1}^N\|X^{(n)}\|^2,
$$

that is, when we assemble the slices to obtain the final bound. To illustrate this loss,
consider the case where $X$ is a diagonal matrix with $b_{ii}=(\log(i+1))^{-1/2}$. Then
it is easily seen that in fact $\|X^\downarrow\|^2=\max_n\|X^{(n)}\|^2$, while every term $\mathbf{E}\|X^{(n)}\|^2$
is of comparable magnitude. We therefore see in this example that the residual
dimension-dependence in Theorem 1.3 is incurred entirely in the above estimate.

Notice that in contrast to the above estimate, we have the exact identity

$$
\|X^\downarrow\|^2 = \sup_{v\in B_2}\sum_{n=1}^N\|X^{(n)}v\|^2.
$$

The previous estimate is sharp when each term in the sum is simultaneously maximized
by the same vector $v$. As the matrices $X^{(n)}$ are independent and have vastly
different dimensions and scales, it seems particularly unlikely that this will be the
case. If it were possible to show that in fact $\mathbf{E}\|X^\downarrow\|^2\lesssim\max_n\mathbf{E}\|X^{(n)}\|^2$ holds in the
general setting, then the slicing method could be adapted to prove Conjectures 1
and 2. However, it is far from clear how this idea could be made precise, and it
appears that the residual dimension-dependence in Theorem 1.3 cannot be further
reduced without the introduction of a genuinely new idea.


4. Geometry 


The aim of this section is to exhibit a very useful mechanism to control the 
geometric structure of the Gaussian processes associated to Gaussian random matrices.
A direct application of this mechanism gives rise to dimension-free bounds
on the spectral norm of Gaussian random matrices that can improve significantly 
on Theorem 1.1 for highly inhomogeneous matrices. Let us begin by formulating a 
general result that can be obtained by this method, from which Theorem 1.4 and 
a number of other interesting consequences will follow as corollaries. 


4.1. A general result. In the sequel, we will denote by $B$ the $d\times d$ symmetric
matrix of variances of the entries of $X$, that is, $B_{ij}:=b_{ij}^2$. We denote by $B^+$ and $B^-$
its positive and negative parts, respectively; that is, if $B=\sum_i\lambda_iu_iu_i^*$ is the spectral
decomposition of $B$, then $B^+:=\sum_i(\lambda_i\vee 0)u_iu_i^*$ and $B^-:=-\sum_i(\lambda_i\wedge 0)u_iu_i^*$.

Theorem 4.1. Let $Y\sim N(0,B^-)$ be Gaussian with covariance matrix $B^-$, and
let $g_1,\dots,g_d\sim N(0,1)$ be i.i.d. standard Gaussian variables. Then for any $\gamma>0$

$$
\mathbf{E}\|X\| \le \sqrt{2+\gamma+\gamma^{-1}}\;\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2g_j^2}\Big]
+ \sqrt{\gamma}\;\mathbf{E}\Big[\max_i Y_i\Big] + 2\max_{ij}b_{ij}.
$$

As a first consequence, we deduce a sharp form of Theorem 1.4.

Corollary 4.2. There is a universal constant $C$ such that

$$
\mathbf{E}\|X\| \le 2\max_i\sqrt{\sum_j b_{ij}^2} + C\max_i\Big[\sum_j(b^*_{ij})^4\Big]^{1/4}\sqrt{\log(i+1)}.
$$


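Proof. As $B^2=(B^+)^2+(B^-)^2$, we have

$$
(B^-_{ii})^2 \le \sum_j(B^-_{ij})^2 = ((B^-)^2)_{ii} \le (B^2)_{ii} = \sum_j b_{ij}^4.
$$

Therefore,

$$
\mathrm{Var}(Y_i) = B^-_{ii} \le \Big[\sum_j b_{ij}^4\Big]^{1/2}
$$

for every $i$, and Lemma 2.3 gives

$$
\mathbf{E}\Big[\max_i Y_i\Big] \lesssim \max_i\Big[\sum_j(b^*_{ij})^4\Big]^{1/4}\sqrt{\log(i+1)}.
$$

On the other hand, by Remark 2.2, we have

$$
\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2g_j^2}\Big]
\le \max_i\sqrt{\sum_j b_{ij}^2} + C'\max_{ij}b^*_{ij}\sqrt{\log(i+1)}
\le \max_i\sqrt{\sum_j b_{ij}^2} + C'\max_i\Big[\sum_j(b^*_{ij})^4\Big]^{1/4}\sqrt{\log(i+1)}
$$

for a universal constant $C'$. Now apply Theorem 4.1 with $\gamma=1$. □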


Let us note that the leading term in the first inequality of Corollary 4.2 is sharp. 
To see this, consider the example of a Wigner matrix where bij = 1 for all i,j. 
Then the first inequality yields $\mathbf{E}\|X\|\le 2\sqrt{d}+o(\sqrt{d})$, which precisely matches the
exact asymptotic $\|X\|\sim 2\sqrt{d}$ as $d\to\infty$ [1, Theorem 2.1.22]. On the other hand,
the second term in this inequality is suboptimal, as can be seen by considering the
example where $X$ is a band matrix with $b_{ij}=1$ inside a diagonal band of width
$\sim\sqrt{\log d}$ and $b_{ij}=0$ outside the band (compare the conclusion of Corollary 4.2
with that of Theorem 1.1). In cases such as the latter example where the second
term dominates, Corollary 4.2 can be improved slightly by optimizing over $\gamma$.

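Corollary 4.3. The expected spectral norm satisfies

$$
\mathbf{E}\|X\| \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij}b^*_{ij}\sqrt{\log i}
+ \Big\{\max_i\sqrt{\sum_j b_{ij}^2} + \max_{ij}b^*_{ij}\sqrt{\log i}\Big\}^{1/2}
\Big\{\max_i\Big[\sum_j(b^*_{ij})^4\Big]^{1/4}\sqrt{\log i}\Big\}^{1/2}.
$$

Proof. Apply Theorems 4.1 and 1.2 and optimize over $\gamma>0$. □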



































Despite the suboptimal nature of the second term in Corollaries 4.2 and 4.3, 
these results can improve significantly on Theorem 1.1 for highly inhomogeneous 
matrices. To illustrate this, let us use Corollary 4.2 to derive a delicate (but much 
less sharp) result of Latała [4] that could not be recovered from Theorem 1.1.

Corollary 4.4 ([4]). The expected spectral norm satisfies

$$
\mathbf{E}\|X\| \lesssim \max_i\sqrt{\sum_j b_{ij}^2} + \Big[\sum_{ij}b_{ij}^4\Big]^{1/4}.
$$

Proof. We may assume without loss of generality that the rows and columns of $X$
have been ordered such that $\max_j b_{ij}$ is nonincreasing in $i$. Then we must have

$$
i\,\max_j b_{ij}^4 \le \sum_{ij}b_{ij}^4
$$

for all $i$, and the conclusion follows readily from Corollary 4.2. □


The above corollaries are based on a rather crude estimate on the variance of the 
random variables $Y_i$ that appear in Theorem 4.1 (see the proof of Corollary 4.2). Unfortunately,
it seems that this estimate cannot be significantly improved in general,
which indicates that there is genuine inefficiency in the proof of Theorem 4.1. The
apparent origin of this inefficiency will be discussed in some detail in the sequel. It is
interesting to note, however, that there are certain special cases where Theorem 4.1
already provides substantially better results than is suggested by Corollary 4.2. For
example, Theorem 4.1 immediately resolves Conjecture 1 (with optimal constant!)
under the strong assumption that the matrix of variances B is positive semidefinite. 


Corollary 4.5. If $B$ is positive semidefinite, then

$$
\mathbf{E}\|X\| \le 2\,\mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2g_j^2}\Big] + 2\max_{ij}b_{ij}.
$$

Proof. In this case $B^-=0$, and the result follows from Theorem 4.1 with $\gamma=1$. □


Along similar lines, it is not difficult to see that if $B$ has at most $k$ negative
eigenvalues, then the conclusion of Conjecture 1 holds with a constant that depends
on $k$ only (so that the conjecture is established if $B$ has $O(1)$ negative eigenvalues).
On the other hand, there are other cases where the special structure of $B$ makes it
possible to deduce Conjecture 1 from Theorem 4.1. For example, if

$$
B = B'\otimes\begin{pmatrix}0&1\\1&0\end{pmatrix},
$$

where $B'$ is a positive semidefinite matrix, then

$$
B^- = \frac{1}{2}\,B'\otimes\begin{pmatrix}1&-1\\-1&1\end{pmatrix},
$$

so that $\mathrm{Var}(Y_i)\le\frac{1}{2}\max_j b_{ij}^2$ for all $i$; then arguing as in the proof of Corollary 4.2
and applying Theorem 1.2 immediately yields Conjecture 1. All of these special
cases are restrictive; however, they emphasize that the approach developed in this
section can already extend significantly beyond the result of Theorem 1.4.
4.2. Proof of Theorem 4.1. We begin by recalling that

$$
\mathbf{E}\|X\| = \mathbf{E}\Big[\sup_{v\in B_2}|\langle v,Xv\rangle|\Big]
$$

is the expected supremum of a Gaussian process indexed by the Euclidean unit
ball $B_2$. It is well known that the supremum of a Gaussian process is intimately
connected with the geometry defined by the associated (semi)metric

$$
d(v,w)^2 := \mathbf{E}[|\langle v,Xv\rangle-\langle w,Xw\rangle|^2].
$$

The difficulty we face is to understand how to control this rather strange geometry.

To motivate the device that we will use for this purpose, let us disregard for the
moment the natural metric $d$ and consider instead a simpler quantity, the variance
of the Gaussian process. We can easily compute

$$
\mathbf{E}[\langle v,Xv\rangle^2] = \sum_i v_i^4b_{ii}^2 + 2\sum_{i\ne j}v_i^2b_{ij}^2v_j^2.
$$

We now observe that this expression can be reorganized in a suggestive manner.
Define the norm $\|\cdot\|_i$ on $\mathbb{R}^d$ and the nonlinear map $x:\mathbb{R}^d\to\mathbb{R}^d$ as

$$
\|v\|_i := \sqrt{\sum_j b_{ij}^2v_j^2}, \qquad x_i(v) := v_i\|v\|_i,
$$

and consider a second Gaussian process

$$
\langle x(v),g\rangle := \sum_i x_i(v)g_i,
$$

where $g_1,\dots,g_d$ are i.i.d. standard Gaussian variables. Then

$$
\mathbf{E}[\langle v,Xv\rangle^2] \le 2\sum_i x_i(v)^2 = 2\,\mathbf{E}[\langle x(v),g\rangle^2].
$$

In particular, we see that the variance of the Gaussian process $\{\langle v,Xv\rangle\}_{v\in B_2}$ associated
with our random matrix is dominated up to a constant by the variance
of the Gaussian process $\{\langle x(v),g\rangle\}_{v\in B_2}$. The latter process is precisely what we
would like to obtain in our upper bound, as we immediately compute

$$
\mathbf{E}\Big[\sup_{v\in B_2}\langle x(v),g\rangle\Big]
\le \mathbf{E}\Big[\sup_{v\in B_2}\sqrt{\sum_i v_i^2\sum_j b_{ij}^2g_j^2}\Big]
= \mathbf{E}\Big[\max_i\sqrt{\sum_j b_{ij}^2g_j^2}\Big]
$$

using the Cauchy-Schwarz inequality and the fact that the map $v\mapsto(v_i^2)$ maps the
Euclidean unit ball in $\mathbb{R}^d$ onto the $d$-dimensional simplex.

Unfortunately, an inequality between the variances of Gaussian processes does
not suffice to control the suprema of these processes. What is sufficient, however,
is to establish such an inequality between the natural distances of these Gaussian
processes: if we could show that the natural distance of the Gaussian process
$\{\langle v,Xv\rangle\}_{v\in B_2}$ is dominated by the natural distance of $\{\langle x(v),g\rangle\}_{v\in B_2}$, that is,

$$
d(v,w) \overset{?}{\lesssim} \|x(v)-x(w)\|,
$$

then the conclusion of Conjecture 1 would follow immediately from the
Slepian-Fernique lemma [3, Theorem 13.3]. Unfortunately, this inequality does not always
hold, see section 4.3 below. However, it turns out that such an inequality nearly
holds, and this is the key device that will be exploited in this section.
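Lemma 4.6 (The basic principle). For every $v,w$ and $\gamma>0$

$$
d(v,w)^2 \le (2+\gamma+\gamma^{-1})\,\|x(v)-x(w)\|^2 - \gamma\sum_{ij}(v_i^2-w_i^2)\,b_{ij}^2\,(v_j^2-w_j^2).
$$

Proof. We first compute $d(v,w)$:

$$
\begin{aligned}
d(v,w)^2 &= \mathbf{E}[\langle v+w,X(v-w)\rangle^2] \\
&= \sum_i(v_i^2-w_i^2)^2b_{ii}^2 + \sum_{i>j}\{(v_i+w_i)(v_j-w_j)+(v_j+w_j)(v_i-w_i)\}^2b_{ij}^2 \\
&= \sum_{ij}(v_i+w_i)^2b_{ij}^2(v_j-w_j)^2 + \sum_{i\ne j}(v_i^2-w_i^2)\,b_{ij}^2\,(v_j^2-w_j^2).
\end{aligned}
$$

We can now estimate using the triangle inequality $\|v+w\|_i\le\|v\|_i+\|w\|_i$

$$
\begin{aligned}
d(v,w)^2 &= \sum_i(v_i-w_i)^2\|v+w\|_i^2 + \sum_{i\ne j}(v_i^2-w_i^2)\,b_{ij}^2\,(v_j^2-w_j^2) \\
&\le \sum_i(v_i-w_i)^2(\|v\|_i+\|w\|_i)^2 + \sum_{ij}(v_i^2-w_i^2)\,b_{ij}^2\,(v_j^2-w_j^2) \\
&= \sum_i(v_i-w_i)^2(\|v\|_i+\|w\|_i)^2 + \sum_i(v_i-w_i)(\|v\|_i+\|w\|_i)(v_i+w_i)(\|v\|_i-\|w\|_i) \\
&= 2\sum_i(v_i-w_i)(\|v\|_i+\|w\|_i)(v_i\|v\|_i-w_i\|w\|_i) \\
&= 2\|x(v)-x(w)\|^2 + 2\sum_i(v_i\|w\|_i-w_i\|v\|_i)(v_i\|v\|_i-w_i\|w\|_i).
\end{aligned}
$$

The elementary inequality $2ab\le\gamma a^2+\gamma^{-1}b^2$ gives

$$
d(v,w)^2 \le (2+\gamma^{-1})\|x(v)-x(w)\|^2 + \gamma\sum_i(v_i\|w\|_i-w_i\|v\|_i)^2
$$

for any $\gamma>0$. We now compute

$$
\begin{aligned}
\sum_i(v_i\|w\|_i-w_i\|v\|_i)^2
&= 2\sum_{ij}v_i^2b_{ij}^2w_j^2 - 2\langle x(v),x(w)\rangle \\
&= \|x(v)-x(w)\|^2 + 2\sum_{ij}v_i^2b_{ij}^2w_j^2 - \sum_{ij}v_i^2b_{ij}^2v_j^2 - \sum_{ij}w_i^2b_{ij}^2w_j^2 \\
&= \|x(v)-x(w)\|^2 - \sum_{ij}(v_i^2-w_i^2)\,b_{ij}^2\,(v_j^2-w_j^2).
\end{aligned}
$$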

Combining these bounds completes the proof. □ 
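
The inequality of Lemma 4.6 can be tested numerically on random inputs. The sketch below is illustrative and not part of the proof; the function names are hypothetical. It computes $d(v,w)^2$ directly as the variance of $\langle v+w,X(v-w)\rangle$, using the independence of the entries $X_{ij}$ for $i\ge j$, and compares it with the right-hand side of the lemma.

```python
import math
import random


def seminorm(b, v, i):
    """||v||_i = sqrt(sum_j b_ij^2 v_j^2)."""
    return math.sqrt(sum((b[i][j] * v[j]) ** 2 for j in range(len(v))))


def deform(b, v):
    """The nonlinear map x(v), with x_i(v) = v_i * ||v||_i."""
    return [v[i] * seminorm(b, v, i) for i in range(len(v))]


def natural_distance_sq(b, v, w):
    """d(v, w)^2 = E[<v + w, X(v - w)>^2] for symmetric X with X_ij = b_ij g_ij."""
    d = len(v)
    s = [v[i] + w[i] for i in range(d)]
    t = [v[i] - w[i] for i in range(d)]
    total = sum((s[i] * t[i] * b[i][i]) ** 2 for i in range(d))
    for i in range(d):
        for j in range(i):
            total += ((s[i] * t[j] + s[j] * t[i]) * b[i][j]) ** 2
    return total


def lemma_4_6_rhs(b, v, w, gamma):
    """(2 + gamma + 1/gamma) ||x(v) - x(w)||^2 - gamma * sum_ij q_i b_ij^2 q_j, q = v^2 - w^2."""
    d = len(v)
    xv, xw = deform(b, v), deform(b, w)
    dist = sum((xv[i] - xw[i]) ** 2 for i in range(d))
    q = [v[i] ** 2 - w[i] ** 2 for i in range(d)]
    quad = sum(q[i] * b[i][j] ** 2 * q[j] for i in range(d) for j in range(d))
    return (2 + gamma + 1 / gamma) * dist - gamma * quad


if __name__ == "__main__":
    rng = random.Random(1)
    d = 4
    b = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            b[i][j] = b[j][i] = rng.random()
    v = [rng.uniform(-1, 1) for _ in range(d)]
    w = [rng.uniform(-1, 1) for _ in range(d)]
    print(natural_distance_sq(b, v, w) <= lemma_4_6_rhs(b, v, w, 1.0))
```

Such a check does not replace the proof, but it is a convenient safeguard against sign errors in the algebra.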

With Lemma 4.6 in hand, we can now easily complete the proof of Theorem 4.1. 

Proof of Theorem 4.1. We begin by noting that the spectral norm of a symmetric
matrix is the largest magnitude of its maximal and minimal eigenvalues, that is,

$$
\|X\| = \sup_{v\in B_2}|\langle v,Xv\rangle| = \sup_{v\in B_2}\langle v,Xv\rangle \vee \sup_{v\in B_2}\langle v,(-X)v\rangle.
$$

As $X$ and $-X$ have the same distribution, we can estimate

$$
\mathbf{E}\|X\| \le \mathbf{E}\Big[\sup_{v\in B_2}\langle v,Xv\rangle\Big] + \sqrt{2}\,\mathrm{Var}\Big[\sup_{v\in B_2}\langle v,Xv\rangle\Big]^{1/2}
\le \mathbf{E}\Big[\sup_{v\in B_2}\langle v,Xv\rangle\Big] + 2\max_{ij}b_{ij},
$$

where we used the Gaussian Poincaré inequality [3, Theorem 3.20] in the second
inequality. To proceed, assume without loss of generality that $Y\sim N(0,B^-)$ is
independent of $g_1,\dots,g_d$, and define the Gaussian process $\{Z_v\}_{v\in B_2}$ as follows:

$$
Z_v := \sqrt{2+\gamma+\gamma^{-1}}\,\langle x(v),g\rangle + \sqrt{\gamma}\sum_i v_i^2Y_i.
$$

The natural distance of this Gaussian process satisfies

$$
\mathbf{E}[|Z_v-Z_w|^2] = (2+\gamma+\gamma^{-1})\,\|x(v)-x(w)\|^2 + \gamma\sum_{ij}(v_i^2-w_i^2)\,B^-_{ij}\,(v_j^2-w_j^2) \ge d(v,w)^2
$$

by Lemma 4.6. We therefore obtain

$$
\mathbf{E}\Big[\sup_{v\in B_2}\langle v,Xv\rangle\Big] \le \mathbf{E}\Big[\sup_{v\in B_2}Z_v\Big]
$$

by the Slepian-Fernique inequality [3, Theorem 13.3]. A simple application of the
Cauchy-Schwarz inequality as discussed before Lemma 4.6 completes the proof. □


4.3. Discussion. It is instructive to discuss the geometric significance of the basic 
principle described by Lemma 4.6. The clearest illustration of this device appears in 
the setting of Corollary 4.5 where the matrix of variances B is positive semidefinite. 
In this case, the second term in Lemma 4.6 is nonpositive, and we obtain 

$$d(v,w) \le 2\|x(v)-x(w)\|.$$

This inequality maps the geometry of the metric space $(B_2,d)$ onto the Euclidean
geometry of the nonlinear deformation of the unit ball

$$B_* := \{x(v) : v \in B_2\},$$

which is much easier to understand.
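The comparison $d(v,w) \le 2\|x(v)-x(w)\|$ for positive semidefinite $B$ is easy to probe numerically. The sketch below is an illustration under stated assumptions: it builds a positive semidefinite matrix with nonnegative entries as a Gram matrix $A^{\mathsf T}A$ of a random nonnegative matrix $A$ (a hypothetical choice of test family), takes $x_i(v) = v_i\big(\sum_j b_{ij}^2 v_j^2\big)^{1/2}$, and uses the expanded formula $d(v,w)^2 = \sum_i (v_i-w_i)^2\|v+w\|_i^2 + \sum_{i,j}(v_i^2-w_i^2)b_{ij}^2(v_j^2-w_j^2)$. All function names are illustrative.

```python
import math
import random


def deform(B, v):
    """x_i(v) = v_i * sqrt(sum_j B[i][j] * v[j]^2), where B[i][j] = b_ij^2."""
    d = len(v)
    return [v[i] * math.sqrt(sum(B[i][j] * v[j] ** 2 for j in range(d))) for i in range(d)]


def dist(B, v, w):
    """Natural distance d(v,w), computed from its expanded squared form."""
    d = len(v)
    s = [vi + wi for vi, wi in zip(v, w)]
    u = [vi ** 2 - wi ** 2 for vi, wi in zip(v, w)]
    first = sum((v[i] - w[i]) ** 2 * sum(B[i][j] * s[j] ** 2 for j in range(d)) for i in range(d))
    second = sum(u[i] * B[i][j] * u[j] for i in range(d) for j in range(d))
    return math.sqrt(max(first + second, 0.0))  # clamp guards against rounding


def random_ball_point(d, rng):
    g = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    r = rng.random()
    return [r * t / norm for t in g]


def psd_slack(trials=1000, d=4, seed=3):
    """Smallest observed value of 2 * ||x(v) - x(w)|| - d(v,w) for PSD B."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        A = [[rng.random() for _ in range(d)] for _ in range(d)]
        # Gram matrix: positive semidefinite, symmetric, nonnegative entries.
        B = [[sum(A[k][i] * A[k][j] for k in range(d)) for j in range(d)] for i in range(d)]
        v, w = random_ball_point(d, rng), random_ball_point(d, rng)
        dx = math.sqrt(sum((p - q) ** 2 for p, q in zip(deform(B, v), deform(B, w))))
        worst = min(worst, 2 * dx - dist(B, v, w))
    return worst
```

A return value of `psd_slack()` that is nonnegative up to rounding is what the inequality predicts. Replacing the Gram matrix by a general symmetric matrix with nonnegative entries removes that guarantee.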


Example 4.7. The trivial case of this construction appears in the example of
a Wigner matrix where $b_{ij} = 1$ for all $i,j$. In this special case, the nonlinear
deformation has no effect and $B_* = B_2$ is simply the Euclidean unit ball. Applying
the Slepian-Fernique inequality in this setting shows that

$$\mathbf{E}\|X\| \lesssim \mathbf{E}\Big[\sup_{t\in B_*}\langle t,g\rangle\Big] = \mathbf{E}\|g\| \le \sqrt{d}.$$


This idea is not new: our approach reduces in this trivial setting to the well-known
method of Gordon for estimating the norm of Wigner matrices [8, Section 5.3.1].
However, the crucial insight developed here is that the geometry of $B_*$ changes
drastically when we depart from the simple setting of Wigner matrices, which is
not captured by Gordon's method. This is illustrated in the following example.
Figure 4.1: Various possible shapes of the deformed ball $B_* = \{x(v) : v \in B_2\}$
are illustrated in the two-dimensional case $d = 2$. Note that the matrix $B$ is
positive semidefinite in the first three examples but not in the fourth example.
[Panel label: $B = \begin{pmatrix} 1/8 & 1 \\ 1 & 1/8 \end{pmatrix}$.]


Example 4.8. Consider the example of a diagonal random matrix where $b_{ij} = 1_{i=j}$.
In this case, the nonlinear deformation transforms the Euclidean unit ball
into the $\ell_1$-ball $B_* = B_1$, whose geometry is entirely different from that in the
previous example. Applying the Slepian-Fernique inequality in this setting shows that

$$\mathbf{E}\|X\| \lesssim \mathbf{E}\Big[\sup_{t\in B_*}\langle t,g\rangle\Big] = \mathbf{E}\|g\|_\infty \lesssim \sqrt{\log d},$$


which captures precisely the correct behavior in this setting. 
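The two examples can be verified directly from the definition $x_i(v) = v_i\big(\sum_j b_{ij}^2 v_j^2\big)^{1/2}$. In the Wigner case $x(v) = \|v\|\,v$, so $\|x(v)\|_2 = \|v\|_2^2$ and the deformed ball is the Euclidean ball. In the diagonal case $x_i(v) = v_i|v_i|$, so $\|x(v)\|_1 = \|v\|_2^2$ and the deformed ball is the $\ell_1$-ball. The short Python sketch below checks these two identities at a random point; the function names and the choice of dimension are illustrative assumptions.

```python
import math
import random


def deform(B, v):
    """x_i(v) = v_i * sqrt(sum_j B[i][j] * v[j]^2), where B[i][j] = b_ij^2."""
    d = len(v)
    return [v[i] * math.sqrt(sum(B[i][j] * v[j] ** 2 for j in range(d))) for i in range(d)]


def check_examples(d=6, seed=2):
    """Return (||v||_2^2, ||x(v)||_2 for Wigner B, ||x(v)||_1 for diagonal B)."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    r = rng.random()
    v = [r * t / norm for t in g]  # a point of the Euclidean unit ball

    wigner = [[1.0] * d for _ in range(d)]                                        # b_ij = 1
    diagonal = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]    # b_ij = 1_{i=j}

    norm_sq = sum(t * t for t in v)
    l2_wigner = math.sqrt(sum(t * t for t in deform(wigner, v)))
    l1_diagonal = sum(abs(t) for t in deform(diagonal, v))
    return norm_sq, l2_wigner, l1_diagonal
```

All three returned numbers agree up to rounding and are at most 1, which is the statement that $x$ maps $B_2$ into $B_2$ in the first case and into $B_1$ in the second.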


In general, the deformed ball $B_*$ can take very different shapes, as is illustrated
in Figure 4.1. The beauty of this construction is that the manner in which the
geometry of the space $(B_2,d)$ is captured by the geometry of $(B_*, \|\cdot\|)$ provides a
clear mechanism that gives rise to the phenomenon predicted by Conjecture 1.

Unfortunately, the simple geometry exhibited above is much less clear when the 
matrix B is not positive semidefinite. One might hope that the inequality 

$$d(v,w) \overset{?}{\lesssim} \|x(v)-x(w)\|$$

remains valid in the general setting, but this is not always true. The following 
illuminating example was suggested by Afonso Bandeira. 

Example 4.9. Let $a, b, \delta > 0$ ($a \ne b$) and let



We readily compute

$$d(v,w)^2 = \delta(a^2-b^2)^2,$$

while

$$\|x(v)-x(w)\|^2 = \big(a\sqrt{\delta a^2+b^2} - b\sqrt{\delta b^2+a^2}\big)^2 \sim \frac{\delta^2(a^4-b^4)^2}{4a^2b^2} \quad\text{as } \delta \downarrow 0.$$

Thus the ratio

$$\frac{d(v,w)}{\|x(v)-x(w)\|}$$

can be arbitrarily large. This example is essentially the worst possible, as optimizing
$\gamma$ in Lemma 4.6 shows that $d(v,w)^2 \lesssim \max_{ij} b_{ij}\,\|x(v)-x(w)\|$ for $v,w \in B_2$.
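The blow-up of the ratio can be seen by evaluating the two closed-form expressions of Example 4.9 for decreasing $\delta$. The Python sketch below does this without reconstructing $B$, $v$, $w$: it takes as given only the formulas $d(v,w)^2 = \delta(a^2-b^2)^2$ and $\|x(v)-x(w)\|^2 = (a\sqrt{\delta a^2+b^2} - b\sqrt{\delta b^2+a^2})^2$, and compares the latter with its leading-order behavior $\delta^2(a^4-b^4)^2/4a^2b^2$. The function names and the sample values $a = 0.8$, $b = 0.5$ are illustrative assumptions.

```python
import math


def d_sq(a, b, delta):
    """d(v,w)^2 as displayed in Example 4.9."""
    return delta * (a ** 2 - b ** 2) ** 2


def dx_sq(a, b, delta):
    """||x(v) - x(w)||^2 as displayed in Example 4.9."""
    return (a * math.sqrt(delta * a ** 2 + b ** 2) - b * math.sqrt(delta * b ** 2 + a ** 2)) ** 2


def dx_sq_asymptotic(a, b, delta):
    """Leading-order behavior of ||x(v) - x(w)||^2 as delta decreases to 0."""
    return delta ** 2 * (a ** 4 - b ** 4) ** 2 / (4 * a ** 2 * b ** 2)


def ratio(a, b, delta):
    """The ratio d(v,w) / ||x(v) - x(w)||."""
    return math.sqrt(d_sq(a, b, delta) / dx_sq(a, b, delta))


def ratio_table(a=0.8, b=0.5, deltas=(1e-2, 1e-4, 1e-6)):
    """Pairs (delta, ratio), showing growth of order delta^(-1/2)."""
    return [(delta, ratio(a, b, delta)) for delta in deltas]
```

Since $d(v,w)^2$ is of order $\delta$ while $\|x(v)-x(w)\|^2$ is of order $\delta^2$, the ratio grows like $\delta^{-1/2}$, and the values returned by `ratio_table()` increase by roughly a factor of 10 each time $\delta$ decreases by a factor of 100.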
Remark 4.10. While the above example illustrates conclusively that the desired
inequality cannot hold in general when $B$ is not positive semidefinite, we also note
that the failure point in this example appears to be very special. The vectors $v$
and $w$, while far apart in the Euclidean distance, satisfy both $d(v,w) = 0$ and
$\|x(v)-x(w)\| = 0$ when $\delta = 0$. These points are therefore in some sense "singular"
with respect to the geometry of $(B_2,d)$ and of $(B_*, \|\cdot\|)$ when $\delta = 0$. Example 4.9
shows that the comparison between the two geometries can fail near such singular
points. Numerical experiments suggest that such points are rather rare and that
the inequality $d(v,w) \le 2\|x(v)-x(w)\|$ typically fails only in a very small subset
of the unit ball. We do not have a precise formulation of this idea, however.

The phenomenon that is illustrated by Example 4.9 is controlled in Lemma 4.6
by the addition of a second term that dominates the bound at the singular points
of the geometry of $(B_2,d)$. The remarkable aspect of this second term is that it
has a very suggestive interpretation: if the matrix $-B$ were positive semidefinite
(which of course cannot be the case as $B$ has nonnegative entries), this would be
the natural distance corresponding to the Gaussian process defined by the convex hull
of random variables $U_1,\ldots,U_d$ with $U \sim N(0,-B)$. By Lemma 2.3, the supremum
of this Gaussian process would be of the same order as the second term of the last
expression in Theorem 1.2, which would suffice to establish Conjecture 1.

While this intuition clearly cannot be implemented in this manner, it is nonetheless
highly suggestive that the validity of Conjecture 1 can "almost" be read off
from the geometric structure described by Lemma 4.6. Unfortunately, we do not
know how to optimally exploit this geometric structure. In Theorem 4.1, we have
crudely forced $-B = B^- - B^+$ to be positive definite by estimating it from above
by $B^-$. The problem with this approach is that the entries of $B^-$ can be much
larger than the entries of $B$, which is the origin of the suboptimal second term in
Corollary 4.2: there can in general be significant cancellation between $B^-$ and $B^+$
that our approach fails to exploit. The elimination of this inefficiency in the proof
of Theorem 4.1 would be a significant step towards the resolution of Conjecture 1.

We conclude by noting that there is no reason, in principle, to expect that 
a sharp bound on the expected supremum of a Gaussian process can always be 
obtained using the Slepian-Fernique inequality, as we have done in the proof of 
Theorem 2.1. In general, the connection between the supremum of a Gaussian 
process and the underlying geometry is described by the generic chaining method 
[7]. Unfortunately, even a geometric description along these lines of the trivial 
behavior of the supremum of a Gaussian process over a convex hull remains a 
long-standing open problem [7, pp. 50-51], so that a direct application of generic 
chaining methods in the present setting appears to present formidable difficulties. 